AITopics | neural policy

Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning

Neural Information Processing Systems, Dec-27-2025, 01:25:02 GMT

Among approaches for provably safe reinforcement learning, Model Predictive Shielding (MPS) has proven effective at complex tasks in continuous, high-dimensional state spaces by leveraging a backup policy to ensure safety when the learned policy attempts to take risky actions. However, while MPS can ensure safety both during and after training, it often hinders task progress due to the conservative and task-oblivious nature of backup policies. This paper introduces Dynamic Model Predictive Shielding (DMPS), which optimizes reinforcement learning objectives while maintaining provable safety. DMPS employs a local planner to dynamically select safe recovery actions that maximize both short-term progress and long-term rewards. Crucially, the planner and the neural policy play a synergistic role in DMPS. When planning recovery actions for ensuring safety, the planner utilizes the neural policy to estimate long-term rewards, allowing it to observe beyond its short-term planning horizon. Conversely, the neural policy under training learns from the recovery plans proposed by the planner, converging to policies that are both high-performing and safe in practice. This approach guarantees safety during and after training, with bounded recovery regret that decreases exponentially with planning horizon depth. Experimental results demonstrate that DMPS converges to policies that rarely require shield interventions after training and achieve higher rewards compared to several state-of-the-art baselines.
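The shielding idea in the abstract above can be illustrated with a minimal sketch of the baseline MPS check, assuming a toy one-dimensional system. The dynamics, the wall position, the braking backup policy, and every function name here are hypothetical and are not taken from the paper. The sketch uses a fixed backup policy, which is the conservative behavior the abstract attributes to MPS; DMPS, as described, would replace that fallback with a planner that chooses among safe recovery actions.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (position, velocity) of a hypothetical toy system

DT = 0.1          # simulation time step (assumed)
MAX_ACCEL = 1.0   # acceleration limit (assumed)
WALL = 1.0        # the state is unsafe once position reaches the wall (assumed)
REST_TOL = 1e-9   # velocities below this magnitude count as "at rest"


def step(state: State, accel: float) -> State:
    """Deterministic double-integrator dynamics with clipped acceleration."""
    x, v = state
    a = max(-MAX_ACCEL, min(MAX_ACCEL, accel))
    v_next = v + a * DT
    return (x + v_next * DT, v_next)


def is_safe(state: State) -> bool:
    return state[0] < WALL


def backup_policy(state: State) -> float:
    """Brake toward zero velocity; step() clips the request to the limit."""
    return -state[1] / DT


def is_recoverable(state: State, horizon: int) -> bool:
    """True if braking from `state` stays safe and reaches rest within `horizon` steps."""
    for _ in range(horizon):
        if not is_safe(state):
            return False
        if abs(state[1]) < REST_TOL:
            return True
        state = step(state, backup_policy(state))
    return is_safe(state) and abs(state[1]) < REST_TOL


def shielded_action(state: State,
                    learned_policy: Callable[[State], float],
                    horizon: int = 50) -> float:
    """Accept the learned action only if the successor state is recoverable."""
    proposed = learned_policy(state)
    if is_recoverable(step(state, proposed), horizon):
        return proposed
    return backup_policy(state)
```

Starting from a recoverable state, every state visited under `shielded_action` stays recoverable, because the fallback simply follows the braking trajectory that was already verified. The cost is visible in this toy system too: near the wall the shield keeps overriding the learned policy with braking, regardless of the task reward.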

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

NetHack is Hard to Hack

Neural Information Processing Systems, Dec-26-2025, 03:12:23 GMT

Neural policy learning methods have achieved remarkable results in various control problems, ranging from Atari games to simulated locomotion.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

Neural Information Processing Systems, Dec-26-2025, 01:27:17 GMT

Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards.
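One common way to combine tree search with a neural policy is a PUCT-style selection rule, in which the policy output acts as a prior over actions and the Q-values act as the exploitation term. The sketch below shows that generic rule only as an illustration; the abstract does not state the exact formula M-Walk uses, and the function name, the `c_puct` constant, and the `+ 1` inside the square root are assumptions.

```python
import math
from typing import Sequence


def puct_select(priors: Sequence[float],
                q_values: Sequence[float],
                visits: Sequence[int],
                c_puct: float = 1.0) -> int:
    """Return the index of the action with the highest PUCT-style score.

    score(a) = Q(a) + c_puct * prior(a) * sqrt(total_visits + 1) / (1 + visits(a))

    The prior would come from the neural policy head and Q(a) from the
    value head or from search statistics (both hypothetical inputs here).
    """
    if not (len(priors) == len(q_values) == len(visits)):
        raise ValueError("priors, q_values and visits must have equal length")
    if not priors:
        raise ValueError("at least one action is required")

    total = sum(visits)
    best_action = 0
    best_score = -math.inf
    for action, (prior, q, n) in enumerate(zip(priors, q_values, visits)):
        exploration = c_puct * prior * math.sqrt(total + 1) / (1 + n)
        score = q + exploration
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With no visits recorded, the rule follows the policy prior; as an action accumulates visits, its exploration bonus shrinks and the search shifts to less-visited actions or to actions with higher Q-values, which is how search can surface rewarding trajectories when rewards are sparse.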

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Imitation-Projected Programmatic Reinforcement Learning

Abhinav Verma, Hoang Le, Yisong Yue, Swarat Chaudhuri

Neural Information Processing Systems, Nov-16-2025, 14:43:58 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (0.46)

Industry:

Education (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Safely Imitating a Neural Policy

Neural Information Processing Systems, Nov-13-2025, 22:38:05 GMT

Here we provide proofs of the theoretical results from Section 3.2 and extend the discussion of a few

artificial intelligence, benchmark, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Transportation (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

Graph Attention-Guided Search for Dense Multi-Agent Pathfinding

Rishabh Jain, Keisuke Okumura, Michael Amir, Amanda Prorok

arXiv.org Artificial Intelligence, Oct-21-2025

Finding near-optimal solutions for dense multi-agent pathfinding (MAPF) problems in real time remains challenging even for state-of-the-art planners. To this end, we develop a hybrid framework that integrates a learned heuristic derived from MAGAT, a neural MAPF policy with a graph attention scheme, into a leading search-based algorithm, LaCAM. While prior work has explored learning-guided search in MAPF, such methods have historically underperformed. In contrast, our approach, termed LaGAT, outperforms both purely search-based and purely learning-based methods in dense scenarios. This is achieved through an enhanced MAGAT architecture, a pre-train-then-fine-tune strategy on maps of interest, and a deadlock detection scheme to account for imperfect neural guidance. Our results demonstrate that, when carefully designed, hybrid search offers a powerful solution for tightly coupled, challenging multi-agent coordination problems.

artificial intelligence, lacam, proceedings, (16 more...)

arXiv.org Artificial Intelligence

arXiv:2510.17382

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Japan (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Imitation-Projected Programmatic Reinforcement Learning

Abhinav Verma, Hoang Le, Yisong Yue, Swarat Chaudhuri

Neural Information Processing Systems, Oct-2-2025, 19:23:14 GMT

However, such a distillation process can yield a highly suboptimal programmatic policy -- i.e., a large

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Florida > Pinellas County > St. Petersburg (0.04)
(4 more...)

Genre: Research Report (0.46)

Industry:

Education (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Common Benchmarks Undervalue the Generalization Power of Programmatic Policies

Amirhossein Rajabpour, Kiarash Aghakasiri, Sandra Zilles, Levi H. S. Lelis

arXiv.org Artificial Intelligence, Jun-18-2025

Algorithms for learning programmatic representations for sequential decision-making problems are often evaluated on out-of-distribution (OOD) problems, with the common conclusion that programmatic policies generalize better than neural policies on OOD problems. In this position paper, we argue that commonly used benchmarks undervalue the generalization capabilities of programmatic representations. We analyze the experiments of four papers from the literature and show that neural policies, which were shown not to generalize, can generalize as effectively as programmatic policies on OOD problems. This is achieved with simple changes in the neural policies' training pipeline. Namely, we show that simpler neural architectures with the same type of sparse observation used with programmatic policies can help attain OOD generalization. Another modification we have shown to be effective is the use of reward functions that allow for safer policies (e.g., agents that drive slowly can generalize better). We also argue for creating benchmark problems that highlight concepts needed for OOD generalization and that may challenge neural policies but align with programmatic representations, such as tasks requiring algorithmic constructs like stacks.
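To make the closing point about stacks concrete, here is a minimal sketch of a programmatic policy for a hypothetical task that is not taken from the paper: the agent observes a stream of item labels and `"pop"` requests, and must answer each request with the most recently observed item that has not yet been returned. The task, the observation encoding, and the function name are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def stack_policy(observations: Sequence[str]) -> List[Optional[str]]:
    """Answer each "pop" request with the latest unmatched item.

    Any observation other than the literal string "pop" is treated as an
    item to remember. A request with nothing left to return yields None.
    """
    stack: List[str] = []
    actions: List[Optional[str]] = []
    for obs in observations:
        if obs == "pop":
            actions.append(stack.pop() if stack else None)
        else:
            stack.append(obs)
    return actions
```

Because the policy stores items in an explicit stack, it behaves the same for any nesting depth, including depths far beyond those seen during training. That kind of length generalization is the property a benchmark built around such a task would probe.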

agent, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

arXiv:2506.14162

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
